Learning pronunciation dictionary from speech data
Abstract
This paper presents an algorithm for automatically learning pronunciation variations from speech data, together with first results from our investigations. Pronunciation dictionaries are an important component of state-of-the-art speech recognition systems. Most systems implement only simple dictionaries containing the canonical pronunciation forms. Good recognition performance, however, requires more sophisticated dictionaries that include pronunciation variations. Generating such dictionaries by hand is extremely time-consuming and likely to introduce errors and inconsistencies. We show an approach for automatically generating suitable pronunciation dictionaries from the speech database itself; such dictionaries are desirable not only for speech recognition tasks but also for speech technology and phonological research in general. Apart from the database itself, the only knowledge sources are the (unlabeled) signals and their word-level transliterations. First experiments, which yielded promising results, were performed with the DataLab software system [6], which integrates the recognition system of TU Dresden.
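As a rough illustration of the general idea of a data-driven pronunciation dictionary, and not the algorithm of this paper (which the abstract does not detail), the sketch below counts the phone sequences observed for each word and keeps the variants that occur often enough. It assumes that some earlier step, such as a phone recognizer constrained by the word-level transliteration, has already produced (word, phone sequence) pairs; the function name `learn_variants`, the threshold `min_rel_freq`, and the example phone symbols are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_variants(observations, min_rel_freq=0.2):
    """Build a pronunciation dictionary from observed (word, phones) pairs.

    observations: iterable of (word, sequence_of_phone_symbols).
    min_rel_freq: hypothetical pruning threshold; variants whose relative
                  frequency for a word falls below it are discarded.
    Returns {word: [(phone_tuple, relative_frequency), ...]}, with variants
    ordered from most to least frequent.
    """
    counts = defaultdict(Counter)
    for word, phones in observations:
        counts[word][tuple(phones)] += 1

    dictionary = {}
    for word, variants in counts.items():
        total = sum(variants.values())
        dictionary[word] = [
            (phones, n / total)
            for phones, n in variants.most_common()
            if n / total >= min_rel_freq
        ]
    return dictionary


# Toy input: ten observed realizations of the word "and" (made-up data).
observations = (
    [("and", ("ae", "n", "d"))] * 6
    + [("and", ("ae", "n"))] * 3
    + [("and", ("n",))] * 1
)
lexicon = learn_variants(observations)
for phones, freq in lexicon["and"]:
    print("and", " ".join(phones), round(freq, 2))
```

With the toy data, the rarest variant falls below the threshold and is dropped, while the canonical form and one reduced form are kept. A real system would also have to decide how reliable each observed phone sequence is, which this sketch ignores.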
Related resources
Speech Enhancement using Adaptive Data-Based Dictionary Learning
In this paper, a speech enhancement method based on sparse representation of data frames is presented. Speech enhancement is one of the most widely applied areas of signal processing. The objective of a speech enhancement system is to improve either the intelligibility or the quality of speech signals. This process is carried out using the speech signal processing techniques ...
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text
While the performance of automatic speech recognition systems has recently approached human levels in some tasks, the application is still limited to specific domains. This is because system development relies on extensive supervised training and expert tuning in the target domain. To solve this problem, systems must become more self-sufficient, having the ability to learn directly from speech ...
Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks
Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: the large effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation...
Automatic Learning and Optimization of Pronunciation Dictionaries
Pronunciation dictionaries are the interface between orthographic and phonetic representation of the speech signal and are thereby a substantial component of speech recognition systems. In many systems simple canonical pronunciation forms are used within the dictionary. They represent the “correct” pronunciation as they are found in lexicons and neither contain the most frequent pronunciation n...
Pronunciation prediction with Default&Refine
The Default&Refine algorithm is a new rule-based learning algorithm that was developed as an accurate and efficient pronunciation prediction mechanism for speech processing systems. The algorithm exhibits a number of attractive properties including rapid generalisation from small training sets, good asymptotic accuracy, robustness to noise in the training data, and the production of compact rul...